Co-Betweenness: A Pairwise Notion of Centrality 
Betweenness centrality is a metric that seeks to quantify a sense of the importance of a vertex
in a network graph in terms of its 'control' on the distribution of information along geodesic paths
throughout that network. This quantity however does not capture how different vertices participate
together in such control. In order to allow for the uncovering of finer details in this regard, we introduce
here an extension of betweenness centrality to pairs of vertices, which we term co-betweenness,
that provides the basis for quantifying various analogous pairwise notions of importance and control.
More specifically, we motivate and define a precise notion of co-betweenness, we present an efficient
algorithm for its computation, extending the algorithm of [1] in a natural manner, and we illustrate
the utilization of this co-betweenness on a handful of different communication networks. From these
real-world examples, we show that the co-betweenness allows one to identify certain vertices which
are not the most central vertices but which, nevertheless, act as important actors in the relaying
and dispatching of information in the network.
I. INTRODUCTION

In social network analysis, the problem of determining 
the importance of actors in a network has been studied 
for a long time (see, for example, [2]). It is in this context
that the concept of the centrality of a vertex in a network 
emerged. There are numerous measures that have been 
proposed to numerically quantify centrality which differ 
both in the nature of the underlying notion of vertex 
importance that they seek to capture, and in the manner 
in which that notion is encoded through some functional 
of the network graph. See [3], for example, for a recent 
review and categorization of centrality measures. 

Paths - as the routes by which flows (e.g., of informa- 
tion or commodities) travel over a network - are funda- 
mental to the functioning of many networks. Therefore, 
not surprisingly, a number of centrality measures quantify
importance with respect to the sharing of paths in
the network. One popular measure is betweenness cen- 
trality. First introduced in its modern form by [4], the 
betweenness centrality is essentially a measure of how 
many geodesic (i.e., shortest) paths run over a given vertex.
In other words, in a social network for example,
the betweenness centrality measures the extent to which 
an actor "lies between" other individuals in the network, 
with respect to the network path structure. As such, it 
is a measure of the control that actor has over the distri- 
bution of information in the network. 

The betweenness centrality - as with all other central- 
ity measures of which we are aware - is defined specifi- 
cally with respect to a single given vertex. In particular, 
vertex centralities produce an ordering of the vertices in 
terms of their individual importance, but do not provide
insight into the manner in which vertices act together in
the spread of information across the network. Insight of 
this kind can be important in presenting an appropriately 
more nuanced view of the roles of the different vertices, 
beyond their individual importance. A first natural ex- 
tension of the idea of centrality in this manner is to pairs 
of vertices. 

In this paper, we introduce such an extension, which 
we term the co-betweenness centrality, or simply the co- 
betweenness. The co-betweenness of two vertices is essen- 
tially a measure of how many geodesic paths are shared 
by the vertices, and as such provides us with a sense of the 
interplay of vertices across the network. For example, the 
co-betweenness alone quantifies the extent to which pairs 
of vertices jointly control the distribution of information 
in the network. Alternatively, a standardized version of 
co-betweenness produces a well-defined measure of cor- 
relation between flows over the two vertices. Finally, an 
alternative normalization quantifies the extent to which 
one vertex controls the distribution of information to an- 
other vertex. 

This paper is organized as follows. In Section II
we briefly review necessary technical background. In
Section III we provide a precise definition for the
co-betweenness and related measures, and motivate
each in the context of an Internet communication network.
An algorithm for the efficient computation of co-betweenness,
for all pairs of vertices in a network, is
sketched in Section IV, and its properties are discussed.
In Section V we further illustrate our measures using
two social networks whose ties are reflective of communication.
Some additional discussion is provided in Section VI.
Finally, a formal description of our algorithm,
as well as pseudo-code, may be found in the appendix.
II. BACKGROUND

Let $G = (V, E)$ denote an undirected, connected network
graph with $n_v$ vertices in $V$ and $n_e$ edges in $E$.
A walk on $G$, from a vertex $v_0$ to another vertex $v_\ell$,
is an alternating sequence of vertices and edges, say
$\{v_0, e_1, v_1, \ldots, v_{\ell-1}, e_\ell, v_\ell\}$, where the endpoints of $e_i$ are
$\{v_{i-1}, v_i\}$. The length of this walk is said to be $\ell$. A
trail is a walk without repeated edges, and a path, a trail
without repeated vertices. A shortest path between two
vertices $u, v \in V$ is a path between $u$ and $v$ whose length
$\ell$ is a minimum. Such a path is also called a geodesic and
its length, the geodesic distance between $u$ and $v$. In the
case that the graph $G$ is weighted, i.e., there is a collection
of edge weights $\{w_e\}_{e \in E}$, where $w_e > 0$, shortest paths
may be instead defined as paths for which the total sum 
of edge weights is a minimum. In the material that fol- 
lows, we will restrict our exposition primarily to the case 
of unweighted graphs, but extensions to weighted graphs 
are straightforward. For additional background of this 
type, see, for example, the textbook 

Let $\sigma_{st}$ denote the total number of shortest paths that
connect vertices $s$ and $t$ (with $\sigma_{ss} = 1$), and let $\sigma_{st}(v)$
denote the number of shortest paths between $s$ and $t$ that
also run over vertex $v$. Then we define the betweenness
centrality of a vertex $v$ as a weighted sum of the number
of paths through $v$,

$$B(v) = \sum_{s,t \in V \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}} . \qquad (1)$$

Note that this definition excludes the shortest paths that 
start or end at v. However, in a connected graph we 
will have $\sigma_{st}(v) = \sigma_{st}$ whenever $s = v$ or $t = v$, so
the exclusion amounts to removing a constant term that 
would otherwise be present in the betweenness centrality 
of every vertex. 
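
As a concrete check of this definition, the following minimal Python sketch computes the betweenness by brute force. It is an illustration only and not the algorithm developed in this paper. It assumes an unweighted, connected, undirected graph stored as an adjacency dictionary, it sums over unordered pairs {s, t} (summing over ordered pairs would simply double every value), and it uses the fact that v lies on a geodesic between s and t exactly when d(s,v) + d(v,t) = d(s,t), in which case the number of such geodesics is the product of the path counts on either side of v. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Return (dist, sigma): geodesic distances from s and numbers of shortest paths."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:              # w is reached for the first time
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma

def betweenness(adj):
    """B(v), summed over unordered pairs {s, t} with s, t different from v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    B = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t):
                continue
            # v is on a geodesic between s and t iff the distances add up
            if dist_s[v] + dist_t[v] == dist_s[t]:
                B[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return B

# A 4-cycle: opposite vertices are joined by two geodesics of length two.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(cycle))
```

On the 4-cycle each vertex carries one of the two geodesics between its two neighbors, which is why every vertex receives the same fractional value.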

As an illustration, which we will use throughout this 
section and the next, consider the network in Figure 1.
This is the Abilene network, an Internet network that is 
part of the Internet2 project [? ], a research project de- 
voted to development of the 'next generation' Internet. 
It serves as a so-called 'backbone' network for universities 
and research labs across the United States, in a manner 
analogous to the federal highway system of roads. We use 
this network for illustration because, as a technological 
communication network, the notions of connectivity, in- 
formation, flows, and paths are all explicit and physical, 
and hence facilitate our initial discussion of betweenness 
and co-betweenness. Later, in Section V, we will illustrate
further with two communication networks from the
social network literature. 

The information traversing this network takes the form 
of so-called 'packets', and the packets flow between ori- 
gins and destinations on this network along paths strictly 
determined according to a set of underlying routing pro- 
tocols (Technically, the Abilene network is more accu- 
rately described by a directed graph. But, given the fact 




FIG. 1: Graph representation of the physical topology of the 
Abilene network. Nodes represent regional network aggrega- 
tion points (so-called 'Points-of-Presence' or PoP's), and are 
labeled according to their metropolitan area, while the edges 
represent systems of optical transportation technologies and 
routing devices. 



that routing is typically symmetric in this network, we
follow the Internet2 convention of displaying Abilene using
an undirected graph). A reasonable first approximation
of the routing of information in this network is with
respect to a set of unique shortest paths. In this case,
the betweenness $B(v)$ of any given vertex $v \in V$ will be
exactly equal to the number of shortest paths through
$v$. The vertices in Figure 1 correspond to metropolitan
regions, and have been laid out roughly with respect to
their true geographical locations. Intuitively, and according
to earlier work on centrality in spatial networks [7],
one might suspect that vertices near the central portion
of the network, such as Denver or Indianapolis, have
larger betweenness, being likely forced to support most of
the flows of communication between east and west. We
will see in Section III that such is indeed the case.

Until recently, standard algorithms for computing be- 
tweenness centralities B(v) for all vertices in a network 
had $O(n_v^3)$ running times, which was a stumbling block to
their application in large-scale network analyses. Faster 
algorithms now exist, such as those introduced in 
which have running time of $O(n_v n_e)$ on unweighted networks
and $O(n_v n_e + n_v^2 \log n_v)$ on weighted networks,
with an $O(n_v + n_e)$ space requirement. These improvements
derive from exploiting a clever recursive relation
for the partial sums $\sum_{t \in V} \sigma_{st}(v)/\sigma_{st}$. As we will see,
the need for efficient algorithms is even more important 
in the case of the co-betweenness, and we will make simi- 
lar usage of recursions in developing an efficient algorithm 
for computing this quantity. 



III. CO-BETWEENNESS 

We extend the concept of vertex betweenness centrality
to pairs of vertices $u$ and $v$ by letting $\sigma_{st}(u, v)$ denote the
number of shortest paths between vertices $s$ and $t$ that
pass through both $u$ and $v$, and defining the vertex co-betweenness as

$$C(u, v) = \sum_{s,t \in V \setminus \{u,v\}} \frac{\sigma_{st}(u, v)}{\sigma_{st}} . \qquad (2)$$

FIG. 2: Graph representation of the betweenness and co-betweenness
values for the Abilene network. Vertices are drawn in
proportion to their betweenness. The width of each link is
drawn in proportion to the co-betweenness of the two vertices
incident to it.



Thus co-betweenness gives us a measure of the number 
of shortest paths that run through both vertices u and v. 
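
To make the pairwise definition concrete, here is a brute-force Python sketch of the co-betweenness. It is our own illustration under stated assumptions, not the efficient algorithm of Section IV: the graph is unweighted, connected and undirected, the sum runs over unordered pairs {s, t} that exclude u and v, and the count of geodesics through both vertices is obtained by factorizing along the path, in whichever of the two possible visiting orders is consistent with the distances. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Return (dist, sigma): geodesic distances from s and numbers of shortest paths."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma

def co_betweenness(adj):
    """C(u, v) for every unordered pair, keyed as (u, v) in iteration order of adj."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist = {s: info[s][0] for s in adj}
    sigma = {s: info[s][1] for s in adj}
    C = {}
    for u, v in combinations(adj, 2):
        total = 0.0
        for s, t in combinations(adj, 2):
            if s in (u, v) or t in (u, v):
                continue
            # a is met before b when walking from s to t; at most one order can hold
            for a, b in ((u, v), (v, u)):
                if dist[s][a] + dist[a][b] + dist[b][t] == dist[s][t]:
                    total += sigma[s][a] * sigma[a][b] * sigma[b][t] / sigma[s][t]
        C[(u, v)] = total
    return C

# Path graph 0-1-2-3-4: all geodesics are unique.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
C = co_betweenness(path5)
print(C[(1, 2)], C[(1, 3)], C[(2, 3)])
```

On the path graph, vertices 1 and 2 share the two geodesics that start at vertex 0 and end at vertices 3 or 4, whereas vertices 1 and 3 share only the single geodesic between the two end vertices.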

To gain some insight into the relation between be- 
tweenness and co-betweenness, consider the following sta- 
tistical perspective. Recall the Abilene network described 
in the previous section, and suppose that $x_{s,t}$ is a measure
of the information (i.e., Internet packets) flowing
between vertices $s$ and $t$ in the network. Similarly, let $y_v$
be the total information flowing through vertex $v$. Next,
define $x$ to be the $n_p \times 1$ vector of values $x_{s,t}$, where $n_p$
is the total number of pairs of vertices exchanging information,
and $y$ to be the $n_v \times 1$ vector of values $y_v$. A
common expression modeling the relation between these
two quantities is simply $y = Rx$, where $R$ is an $n_v \times n_p$
matrix (i.e., the so-called 'routing matrix') of 0's and 1's,
indicating through which vertices each given routed path 
goes. 

Now if $x$ is considered as a random variable, with uncorrelated,
unit-variance elements, then its covariance matrix is simply
equal to the $n_p \times n_p$ identity matrix. The elements of $y$,
however, will be correlated, and their covariance matrix
takes the form $Q = RR^T$, by virtue of the linear relation
between $y$ and $x$. Importantly, note that the diagonal
elements of $Q$ are the betweenness values $B(v)$. Furthermore,
the off-diagonal elements are the co-betweenness values $C(u, v)$.
When shortest paths are not unique, the same results
hold if the matrix $R$ is expanded so that each shortest
path between a pair of vertices $s$ and $t$ is afforded a separate
column, and the non-zero entries of each such column
have the value $\sigma_{st}^{-1/2}$, rather than 1. In this case, $R$ may be
interpreted as a stochastic routing matrix.
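
The matrix identity above can be checked numerically on a toy graph. The sketch below is ours and rests on several assumptions that should be read as such: one column is created per geodesic of each unordered pair {s, t}, a column marks only the interior vertices of its path (so that paths starting or ending at a vertex are excluded, as in the definitions of betweenness and co-betweenness), and each non-zero entry is the inverse square root of the number of geodesics of the pair, which is the weight that makes the diagonal of the product reproduce the betweenness. The helper names are hypothetical, and plain lists are used instead of a linear-algebra library.

```python
from collections import deque
from itertools import combinations
from math import sqrt

def shortest_paths(adj, s, t):
    """Enumerate all geodesics from s to t as lists of vertices."""
    dist, preds, queue = {s: 0}, {s: []}, deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                preds[w].append(v)

    def back(v):
        # Walk the predecessor lists back to the source.
        if v == s:
            return [[s]]
        return [p + [v] for u in preds[v] for p in back(u)]

    return back(t)

def routing_matrix(adj):
    """Rows are vertices, columns are geodesics (interior vertices only)."""
    nodes = list(adj)
    columns = []
    for s, t in combinations(nodes, 2):
        paths = shortest_paths(adj, s, t)
        weight = 1.0 / sqrt(len(paths))      # assumed weight; equals 1 for unique paths
        for path in paths:
            interior = set(path[1:-1])
            columns.append([weight if v in interior else 0.0 for v in nodes])
    return nodes, [list(row) for row in zip(*columns)]

def gram(R):
    """Return the product of R with its transpose."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in R] for ri in R]

path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nodes, R = routing_matrix(path5)
Q = gram(R)
print([Q[i][i] for i in range(len(nodes))])   # diagonal: betweenness values
print(Q[1][2], Q[1][3], Q[2][3])              # off-diagonal: co-betweenness values
```

For the path graph the geodesics are unique, so the matrix reduces to the 0/1 routing matrix of the text and the entries of the product are simple path counts.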

To illustrate, in Figure 2 we show a network graph
representation of the matrix Q for the Abilene network. 
The vertices are again placed roughly with respect to 
their actual geographic location, but are now drawn in 



proportion to their betweenness. Edges between pairs of 
vertices now represent non-zero co-betweenness for the 
pair, and are also drawn with a thickness in proportion 
to their value. A number of interesting features are ev- 
ident from this graph. First, we see that, as surmised 
earlier, the more centrally located vertices tend to have 
the largest betweenness values. And it is these vertices 
that typically are involved with the larger co-betweenness 
values. Since the paths going through both a vertex s and 
a vertex t are a subset of the paths going through either 
one or the other, this tendency for large co-betweenness
to associate with large betweenness should not be a sur- 
prise. Also note that the co-betweenness values tend to 
be smaller between vertices separated by a larger geo- 
graphical distance, which again seems intuitive. 

Somewhat more surprising perhaps, however, is the 
manner in which the network becomes disconnected. The 
Seattle vertex is now isolated, as there are no paths that 
route through that vertex - only to and from. Addi- 
tionally, the vertices Houston, Atlanta, and Washington 
now form a separate component in this graph, indicat- 
ing that information is routed on paths running through 
both the first two and the last two, but not through all 
three, and also not through any of these and some other 
vertex. Overall, one gets the impression of information 
being routed primarily over paths along the upper portion
of the network in Figure 1. A similar observation
has been made in [8], using different techniques. 

While the raw co-betweenness values appear to be 
quite informative, one can imagine contexts in which it 
would be useful to compare co-betweenness values across pairs
of vertices in a manner that adjusts for the unequal be- 
tweenness of the participating vertices. The value 



$$C_{corr}(u, v) = \frac{C(u, v)}{\sqrt{B(u)\,B(v)}} \qquad (3)$$

is a natural candidate for a standardized version of the co-betweenness
in (2), being simply the corresponding entry
of the correlation matrix deriving from $Q = RR^T$.
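
As a small numerical illustration of this standardization, the sketch below applies it to hand-computed values for the path graph 0-1-2-3-4 (betweenness 0, 3, 4, 3, 0 when summing over unordered pairs). The function name is hypothetical, and the choice to skip pairs whose denominator is zero is our assumption, since the ratio is undefined when either vertex has zero betweenness.

```python
from math import sqrt

def standardized_co_betweenness(C, B):
    """C_corr(u, v) = C(u, v) / sqrt(B(u) B(v)); pairs with a zero denominator are skipped."""
    result = {}
    for (u, v), value in C.items():
        denom = sqrt(B[u] * B[v])
        if denom > 0:
            result[(u, v)] = value / denom
    return result

# Hand-computed values for the path graph 0-1-2-3-4.
B = {0: 0.0, 1: 3.0, 2: 4.0, 3: 3.0, 4: 0.0}
C = {(0, 1): 0.0, (1, 2): 2.0, (1, 3): 1.0, (2, 3): 2.0}
corr = standardized_co_betweenness(C, B)
print(corr)
```

Because the standardized values are entries of a correlation matrix built from a non-negative matrix, they lie between 0 and 1, which makes pairs with very different betweenness levels directly comparable.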

Figure 3 shows a network graph representation of the
quantities in $C_{corr}$ for the Abilene network, with edges
again drawn in proportion to the values and vertices now
naturally all drawn to be the same size. Much of this
network looks like that in Figure 2. The one notable
exception is that the magnitude of the values between 
the three vertices in the lower subgraph component are 
now of a similar order to most of the other values in the 
other component. This fact may be interpreted as in- 
dicating that among themselves, adjusting for the lower 
levels of information flowing through this part of the net- 
work, these vertices are as strongly 'correlated' as many 
of the others. 

The co-betweenness may also be used to define a di- 
rected notion of the strength of pairwise relationships. 
Let 



$$C(u|v) = \frac{C(u, v)}{B(v)} \qquad (4)$$

FIG. 3: Graph representation of the standardized co-betweenness
values $C_{corr}$ for the Abilene network. Vertices
are all drawn with equal size. Edge width is drawn in proportion
to the standardized co-betweenness of the two vertices
incident to it.




FIG. 4: Directed graph representation of the conditional betweenness
values $C(u|v)$ (given by Eq. (4)) for the Abilene
network. Edges are drawn with width in proportion to their
value of $C(u|v)$ and indicate how one vertex (at the head)
controls the flow of information through another (at the tail).



denote the relative proportion of shortest paths through
$v$ that also go through $u$. This quantity may be interpreted
as a measure of the control that vertex $u$ has over
the information that passes through vertex $v$. Alternatively,
under uniqueness of shortest paths, if from among
the set of shortest paths through $v$ one is chosen uniformly
at random, the value $C(u|v)$ is the probability that
the chosen path will also go through $u$. We call $C(u|v)$
the conditional betweenness of $u$, given $v$. Note that, in
general, $C(u|v) \neq C(v|u)$.
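
The asymmetry of the conditional betweenness is easy to see on a toy example. The following sketch is ours: it divides the co-betweenness of a pair by the betweenness of the conditioning vertex, using hand-computed values for the path graph 0-1-2-3-4 under the unordered-pair convention. The function name is hypothetical, and omitting entries whose conditioning vertex has zero betweenness is an assumption made to avoid dividing by zero.

```python
def conditional_betweenness(C, B):
    """Return cond[(u, v)] = C(u|v) = C(u, v) / B(v), for both orderings of each pair."""
    cond = {}
    for (u, v), value in C.items():
        if B[v] > 0:
            cond[(u, v)] = value / B[v]   # share of paths through v that also use u
        if B[u] > 0:
            cond[(v, u)] = value / B[u]   # share of paths through u that also use v
    return cond

# Hand-computed values for the path graph 0-1-2-3-4.
B = {0: 0.0, 1: 3.0, 2: 4.0, 3: 3.0, 4: 0.0}
C = {(1, 2): 2.0, (1, 3): 1.0, (2, 3): 2.0}
cond = conditional_betweenness(C, B)
print(cond[(1, 2)], cond[(2, 1)])
```

Here half of the geodesics through vertex 2 also pass through vertex 1, while two thirds of those through vertex 1 also pass through vertex 2, so the more central vertex 2 has the stronger hold on the paths of its neighbor.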

Figure 4 shows a graph representation of the values
$C(u|v)$ for the Abilene network. Due to the asymmetry
of these values in $u$ and $v$, arcs are used, rather than
edges, with an arc from $v$ to $u$ corresponding to $C(u|v)$.
The thickness of the arcs is proportional to these values, 
and is therefore indicative of the control exercised on the 
vertex at the tail by the vertex at the head. For improved 



visualization, we have used a simple circular layout for 
the vertices. Examination of this figure shows symmetry 
in the relationships between some pairs of vertices, but 
a strong asymmetry between most others. For example, 
vertices like Indianapolis, which were seen previously to 
have a large betweenness, clearly exercise a strong degree 
of control over almost any other vertices with which they 
share paths. More interestingly, note that certain ver- 
tices that are neighbors in the original Abilene network 
have more symmetric relationships than others. The conditional
betweenness values for Atlanta and Washington, DC,
are fairly similar in magnitude, while those for Los An- 
geles and Sunnyvale are quite dissimilar, with the latter 
evidently exercising a noticeably greater degree of control 
over the former. 



IV. COMPUTATION OF CO-BETWEENNESS 

We discuss here the calculation of the co-betweenness 
values $C(u, v)$ in (2), for all pairs $(u, v)$, from which the
other quantities in (3) and (4) follow trivially. At a first
glance, it would appear that an algorithm of O(n^) run- 
ning time is necessary, given that the number of ver- 
tex pairs grows as the square of the number of vertices. 
Such an implementation would render the notion of co- 
betweenness infeasible to implement in any but network 
graphs of relatively modest size. However, exploiting 
ideas similar to those underlying the algorithms of [1]
for calculating the betweenness values $B(v)$, a decidedly more
efficient implementation may be obtained, as we now de- 
scribe briefly. Details may be found in the appendix. 

Our algorithm for computing co-betweenness involves
a three-stage procedure for each source vertex $s \in V$. In the first
stage, we perform a breadth-first traversal of the network
graph $G$, to quickly compute intermediary quantities such
as $\sigma_{sv}$, the number of shortest paths from the source $s$ to
each other vertex $v$ in the network; in the process we
form a directed acyclic graph that contains all shortest
paths leading from vertex $s$. In the second stage, we iterate
through each vertex in order of decreasing distance
from $s$ and compute a score $S_s(v)$ for each vertex that is
related to its contribution to the co-betweenness. These
contributions are then aggregated in a depth-first traversal
of the directed acyclic graph, which is carried out in
the third and final stage.

In order to compute the number of shortest paths $\sigma_{sv}$
in the first stage, we note that the number of shortest
paths from $s$ to a vertex $v$ is the sum of the numbers of shortest
paths to each parent of $v$ in the directed acyclic graph
rooted at $s$, namely,

$$\sigma_{sv} = \sum_{t \in p_s(v)} \sigma_{st} , \qquad (5)$$

where $p_s(v)$ denotes the set of parent vertices of $v$ in that graph.
In the case of an undirected graph, this can be computed
in the course of a breadth-first search with a running time
of $O(n_e)$.
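
The first stage can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (an unweighted, undirected graph given as an adjacency dictionary, and a hypothetical function name), not the authors' implementation: a single breadth-first search records the visiting order, the path counts, and the parent lists that define the directed acyclic graph of shortest paths rooted at the source.

```python
from collections import deque

def stage_one(adj, s):
    """Breadth-first search from s: visiting order, path counts and DAG parents."""
    dist = {s: 0}
    sigma = {s: 1}          # number of shortest paths from s
    parents = {s: []}       # parents in the shortest-path DAG rooted at s
    order = []              # vertices in order of non-decreasing distance
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:              # first visit of w
                dist[w] = dist[v] + 1
                sigma[w] = 0
                parents[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v is a parent of w
                sigma[w] += sigma[v]
                parents[w].append(v)
    return order, sigma, parents

# A "diamond with a tail": two geodesics lead from 0 to 3, and hence to 4.
diamond = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
order, sigma, parents = stage_one(diamond, 0)
print(order, sigma, parents)
```

Every edge is examined a constant number of times, which is where the linear running time in the number of edges comes from.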






In the second stage, we compute $S_s(v)$ using the recursive
relation established in Theorem 6 of

$$S_s(v) = \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + S_s(w)\right) , \qquad (6)$$



where c s (v) denotes the set of child vertices of v in the 
directed acyclic graph rooted at s. 
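
A minimal sketch of the second stage follows. It assumes that the score obeys the standard dependency recursion for betweenness, in which each child w of v passes back the fraction of its own paths that run through v, multiplied by one plus its own score; the inputs are the visiting order, path counts and parent lists of a breadth-first search, hard-coded here for a five-vertex "diamond with a tail" graph (edges 0-1, 0-2, 1-3, 2-3, 3-4) with source 0. The function name is hypothetical.

```python
def stage_two(order, sigma, parents):
    """Accumulate the scores S_s(v), visiting vertices by decreasing distance from s."""
    S = {v: 0.0 for v in order}
    for w in reversed(order):            # farthest vertices first
        for v in parents[w]:             # w is a child of v in the DAG
            S[v] += sigma[v] / sigma[w] * (1.0 + S[w])
    return S

# Output of a breadth-first search from source 0 on the example graph.
order = [0, 1, 2, 3, 4]
sigma = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2}
parents = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}
S = stage_two(order, sigma, parents)
print(S)
```

Looping over the parents of each vertex, rather than over the children, visits exactly the same parent-child pairs and avoids building a separate child list. The value stored for the source itself simply counts the other vertices and plays no role in what follows.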

Finally, in the third stage, we compute the co-betweennesses
by interpreting the relation

$$C(u, v) = \sum_{s \in V \setminus \{u,v\}} S_s(v)\, \frac{\sigma_{sv}(u)}{\sigma_{sv}} \qquad (7)$$

as assigning a contribution of $S_s(v)/\sigma_{sv}$ to $C(u, v)$ for each of
the $\sigma_{sv}(u)$ shortest paths to $v$ that run through $u$. We
accumulate these contributions at each step of the depth-first
traversal when we visit a vertex $v$ by adding $S_s(v)/\sigma_{sv}$ to
$C(u, v)$ for every ancestor $u$ of the current vertex $v$.
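
Putting the three stages together gives the following Python sketch. It is our reading of the procedure described above, not the authors' Matlab code, and it makes several assumptions: the graph is unweighted, connected and undirected; the score recursion is the standard dependency recursion for betweenness; and results are stored under the ordered key (u, v) whenever u is an ancestor of v, which for an undirected graph yields the same value under (v, u) and corresponds to summing over unordered pairs {s, t}. The function name is hypothetical.

```python
from collections import defaultdict, deque

def co_betweenness(adj):
    """Three-stage sketch; returns C[(u, v)] for ordered pairs with a non-zero value."""
    C = defaultdict(float)
    for s in adj:
        # Stage 1: breadth-first search for path counts and the shortest-path DAG.
        dist, sigma = {s: 0}, {s: 1}
        parents, children = {s: []}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    parents[w], children[w] = [], []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    parents[w].append(v)
                    children[v].append(w)
        # Stage 2: scores S_s(v), in order of decreasing distance from s.
        S = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in parents[w]:
                S[v] += sigma[v] / sigma[w] * (1.0 + S[w])
        # Stage 3: depth-first traversal along every shortest path leaving s.
        # Each stack entry holds a vertex and its ancestors on the current path
        # (the source itself is excluded, since s may not equal u or v).
        stack = [(w, ()) for w in children[s]]
        while stack:
            v, ancestors = stack.pop()
            share = S[v] / sigma[v]
            if share > 0:
                for u in ancestors:
                    C[(u, v)] += share
            for w in children[v]:
                stack.append((w, ancestors + (v,)))
    return dict(C)

path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
diamond = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(co_betweenness(path5))
print(co_betweenness(diamond))
```

Note that the third stage visits a vertex once for every shortest path reaching it, so its cost grows with the number of shortest paths in the network and with their length, in line with the running-time discussion that follows.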

Our proposed algorithms exploit recursions analogous
to those of [1] to produce run-times that are in the worst
case 0(riy), but in empirical studies were found to vary 
like $O(n_v n_e + n_v^{2+p} \log n_v)$ in general, or $O(n_v^{2+p} \log n_v)$
in the case of sparse graphs. Here $p$ is related to the total
number of shortest paths in the network and seems to lie
comfortably between 0.1 and 0.5 in our experience. In the
case of unique shortest paths, it may be shown rigorously
that the running time reduces to $O(n_v n_e + n_v^2 \log n_v)$, and
$O(n_v^2 \log n_v)$ if the network is sparse as well as 'small-world'
(i.e., with diameter of size $O(\log n_v)$). See the
appendix for details. 



V. ADDITIONAL ILLUSTRATIONS 

We provide in this section additional illustration of 
the use of co-betweenness, based on two other network
graphs. Both graphs originally derive from social network
analyses in which one goal was to understand the
flow of certain information among actors. 

A. Michael's Strike Network 

Our first illustration involves the strike dataset of [9],
which is also analyzed in detail in Chapter 7 of [10]. New
management took over at a forest products manufactur- 
ing facility, and this management team proposed certain 
changes to the compensation package of the workers. The 
changes were not accepted by the workers, and a strike 
ensued, which was then followed by a halt in negotia- 
tions. At the request of management, who felt that the 
information about their proposed changes was not be- 
ing communicated adequately, an outside consultant an- 
alyzed the communication structure among 24 relevant 
actors. 

The social network graph in Figure 5 represents the
communication structure among these actors, with an 

FIG. 5: Original strike-group communication network
of [9]. Three subgroups are represented in this network:
younger, Spanish-speaking employees (black vertices),
younger, English-speaking employees (gray vertices), and
older, English-speaking employees (white vertices). The two
union negotiators, Sam and Wendle, are indicated by asterisks
next to their names. Edges indicate that the two incident actors
communicated at some minimally sufficient level of frequency
about the strike.



edge between two actors indicating that they commu- 
nicated at some minimally sufficient level of frequency 
about the strike. Three subgroups are present in the net- 
work: younger, Spanish-speaking employees (black ver- 
tices), younger, English-speaking employees (gray ver- 
tices), and older, English-speaking employees (white ver- 
tices). In addition, the two union negotiators, Sam and 
Wendle, are indicated by asterisks next to their names.
It is these last two that were responsible for explaining 
the details of the proposed changes to the employees. 
When the structure of this network was revealed, two ad- 
ditional actors - Bob and Norm - were approached, had 
the changes explained to them, which they then discussed 
with their colleagues, and within two days the employees 
requested that their union representatives re-open nego- 
tiations. The strike was resolved soon thereafter. 

That such a result could follow by targeting Bob and 
Norm is not entirely surprising, from the perspective of 
the network structure. Both are cut-vertices (i.e., their
removal would disconnect the network), and are inci- 
dent to edges serving as bridges (i.e., their removal simi- 
larly would disconnect the network) from their respective 
groups to at least one of the other groups. 

Co-betweenness provides a useful alternative charac- 
terization, one which explicitly emphasizes the patterns 
of communication in the network, as shown in Figure 6.
As with Figure 2, vertices (now arranged in a circular layout)
are drawn in proportion to their betweenness, and
edges, to their co-betweenness. Bob and Norm clearly
have the largest betweenness values, followed by Alejandro,
who, we remark, is also a cut-vertex, but incident
to a bridge to a smaller subnetwork than the other
two (i.e., four younger Spanish-speakers, in comparison
to nine younger English-speakers and 11 older English-speakers,
for Bob and Norm, respectively). The importance
of these three actors on the communication process
is evident from the distinct triangle formed by their large
co-betweenness values. Note that for the two union representatives,
the co-betweenness values suggest that Sam
also plays a non-trivial role in facilitating communication,
but that Wendle is not well-situated in this regard.
In fact, Wendle is not even connected to the main component
of the graph, since his betweenness is zero (as is
also true for six other actors).

FIG. 6: Co-betweenness for the strike-group communication
network. Actors located apart from the network, in the corners,
are isolated under this representation, as they have zero
betweenness and hence no co-betweenness with any other actors.
(Note: Isolated vertices are drawn to have unit diameter,
and not in proportion to their (zero) betweenness.)

FIG. 7: Conditional co-betweenness for the older English-speaking
actors in the strike-group communication network.




FIG. 8: Karate club network of [11]. The gray vertices represent
members of one of the two smaller clubs and the white
vertices represent members who went to the other club. The 
edges are drawn with a width proportional to the number of 
situations in which the two members interacted. 



A plot of the standardized co-betweenness $C_{corr}$ shows
similar patterns overall, and we have therefore not included
it here. The conditional betweenness $C(u|v)$ for
this network primarily shows most of the actors with 
large arcs pointing to Bob and Norm, and much smaller 
arcs pointing the opposite direction. This pattern further 
confirms the influence that these two actors can have on 
the other actors in the communication process. However,
there are also some interesting asymmetrical relationships
among the actors with smaller parts. For example,
consider Figure 7, which shows the conditional
betweenness among the older English-speaking employ- 
ees. Ultrecht, for example, clearly has potential for a 
large amount of control on the communication of infor- 
mation passing through Russ, and similarly, Karl, on that 
through John. 



B. Zachary's Karate Club Network 

Our second illustration uses the karate club dataset 
of [11]. Over the course of a couple of years in the 1970s,
Zachary collected information from the members of a uni- 
versity karate club, including the number of situations 
(both inside and outside of the club) in which interac- 
tions occurred between members. During the course of 
this study, there was a dispute between the club's admin- 
istrator and the principal karate instructor. As a result, 
the club eventually split into two smaller clubs of approx- 
imately equal size — one centered around the administra- 
tor and the other centered around the instructor. 

Figure 8 displays the network of social interactions between
club members. The gray vertices represent members
of one of the two smaller clubs and the white vertices
represent members who went to the other club. The
edges are drawn with a width proportional to the number
of situations in which the two members interacted. The
graph clearly shows that the original club was already
polarized into two groups centered about actors 1 and
34, who were the key players in the dispute that split the
club in two.

FIG. 9: Co-betweenness for the karate club network. Actors
in the upper-left and lower-right corners, separated from the
connected component, are isolated due to zero betweenness.
The two actors in the lower right-hand corner (i.e., a5 and
a11) have non-zero betweenness, but are bridges, in the sense
that they only serve to connect to other vertices, and hence
have zero co-betweenness. (Note: The vertices for actors with
zero betweenness are drawn to have unit diameter, for purposes
of visibility.)

The co-betweenness for this network is shown in Figure 9.
As in Figure El the layout is done using an energy
minimization algorithm. Again, as in our other examples, 
the co-betweenness entries are dominated by a handful 
of larger values. As might be expected, actors 1 and 34, 
who were at the center of the dispute, have the largest be- 
tweenness centralities and are also involved in the largest 
co-betweenness values. More interesting, however, is the fact
that these two actors have a large co-betweenness with 
each other - despite not being directly connected in the 
original network graph. This indicates that they are nev- 
ertheless involved in connecting a large number of other 
pairs - probably through key intermediaries such as ac- 
tors 3 and 32. These latter two actors, while certainly 
not cut- vert ices, nevertheless seem to operate like con- 
duits between the two groups, quite likely due to their 
direct ties to both actor 1 and either of actors 33 and 
34, the latter of which are both central to the group of 
white vertices. The co-betweenness for actors 1 and 32 
is in fact the largest in the entire network. 

Also of potential interest are the 14 vertices that are 
isolated from the network in the co-betweenness repre- 
sentation. Some of these vertices, such as actor 8, have 



strong social interactions with certain other actors (i.e., 
with actors 1, 2, 3 and 4), but evidently play a peripheral 
role in the communication patterns of the network, as ev- 
idenced by their lack of betweenness. Alternatively, there 
are the vertices like those representing actors 5 and 11, 
who have some betweenness centrality but nonetheless 
find themselves cut off from the connected component in 
the co-betweenness graph. An examination of the def- 
inition of the co-betweenness tells us that such vertices 
must be bridge-vertices, in the sense that they only serve
to connect pairs of other vertices, i.e., they only occur in 
the middle of paths of length two. 
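
A small worked example may make this concrete. It is our own illustration and is not drawn from the karate club data: consider a star graph with a center $c$ and $m \geq 2$ leaves $\ell_1, \ldots, \ell_m$, and sum over unordered pairs $\{s, t\}$. The center then has positive betweenness but zero co-betweenness with every other vertex, which is exactly the situation of a bridge-vertex.

```latex
% Every geodesic between two distinct leaves is  \ell_i - c - \ell_j  (length two),
% so c is the only interior vertex of any geodesic in the star graph.
B(c) = \sum_{i<j} \frac{\sigma_{\ell_i \ell_j}(c)}{\sigma_{\ell_i \ell_j}} = \binom{m}{2},
\qquad
B(\ell_i) = 0,
\qquad
C(c, \ell_i) = 0 \quad \text{for all } i .
```

No geodesic has two interior vertices, so no pair of vertices can share a path in the sense of the co-betweenness, and the center would appear isolated in a co-betweenness graph despite carrying every geodesic between leaves.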



VI. DISCUSSION 

We introduced in this paper the notion of co- 
betweenness as a natural and interpretable metric for 
quantifying the interplay between pairs of vertices in a 
network graph. As we discussed in different real-world
examples, this quantity has several interesting features. 
In particular, unlike the usual betweenness centrality 
which orders the vertices according to their importance in 
the information flow on the network, the co-betweenness 
gives additional information about the flow structure 
and the correlations between different actors. Using this 
quantity, we were able to identify vertices which are not 
the most central ones, but which however play a very im- 
portant role in relaying the information and which there- 
fore appear as crucial vertices in the control of the infor- 
mation flow. 

In principle, of course, one could continue to define
higher-order analogues, involving three or more vertices
at a time. But the computational requirements associated
with calculating such analogues would soon become
burdensome. In the case of triplets of vertices, one
can expect algorithms analogous to those presented here
to scale no better than $O(n_v^3)$. Additionally, we remark
that, in keeping with the statistics analogy made in Section
[Till, it is likely that the pairwise 'correlations' picked
up by co-betweenness capture to a large extent the more
important elements of vertex interplay in the network,
with respect to shortest paths.

Following the tendencies in the statistical physics
literature on complex networks [H, Qjl, it can be of
some interest to explore the statistical properties of
co-betweenness in large-scale networks. Some work in this
direction may be found in [14], where co-betweenness and
functions thereof were examined in the context of standard
network graph models. The most striking properties
discovered were certain basic scaling relations with
distance between vertices.

On a final note, we point out that, while our discussion
here has been focused on co-betweenness for pairs
of vertices in unweighted graphs, we have also developed
the analogous quantities and algorithms for vertex
co-betweenness on weighted graphs and for edge
co-betweenness on unweighted and weighted graphs. Also
see [H , where a result is given relating edge betweenness
to the eigenvalues of the edge-betweenness 'covariance'
matrix, defined in analogy to the matrix Q in
Section [Till.

This appendix contains details specific to the proposed
algorithm for computing co-betweenness, including
a derivation of key expressions and a rough analysis of
algorithmic complexity. The pseudo-code can be found at
the address [15]. Actual software implementing our
algorithm, written in the Matlab software environment, is
available at [16].



APPENDIX A: DERIVATION OF KEY 
EXPRESSIONS 

Central to our algorithm are the expressions in (6) and
(7), the derivations for which we present here. Before
doing so, however, we need to introduce some definitions
and relations. First note that a simple combinatorial
argument will show that



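$$
\sigma_{st}(v) =
\begin{cases}
\sigma_{sv}\,\sigma_{vt} & \text{if } d(s,t) = d(s,v) + d(v,t), \\
0 & \text{otherwise},
\end{cases}
\tag{A1}
$$

and

$$
\sigma_{st}(u,v) =
\begin{cases}
\sigma_{su}\,\sigma_{uv}\,\sigma_{vt} & \text{if } d(s,t) = d(s,u) + d(u,v) + d(v,t), \\
\sigma_{sv}\,\sigma_{vu}\,\sigma_{ut} & \text{if } d(s,t) = d(s,v) + d(v,u) + d(u,t), \\
0 & \text{otherwise}.
\end{cases}
\tag{A2}
$$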
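As an illustration of the product rules in (A1) and (A2), the following is a minimal Python sketch, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary. The names `bfs_counts`, `sigma_through`, `sigma_through_pair` and the five-vertex graph `EXAMPLE` are hypothetical and chosen only for this example; they are not taken from the authors' software.

```python
from collections import deque

def bfs_counts(adj, s):
    """Return (dist, sigma): BFS distances from s and shortest-path counts."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:              # first time y is reached
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:     # x precedes y on a shortest path
                sigma[y] += sigma[x]
    return dist, sigma

def sigma_through(adj, s, t, v):
    """sigma_st(v) computed with the product rule (A1)."""
    dist_s, sig_s = bfs_counts(adj, s)
    dist_v, sig_v = bfs_counts(adj, v)
    if dist_s[t] == dist_s[v] + dist_v[t]:
        return sig_s[v] * sig_v[t]
    return 0

def sigma_through_pair(adj, s, t, u, v):
    """sigma_st(u, v) computed with the product rule (A2)."""
    info = {x: bfs_counts(adj, x) for x in (s, u, v)}
    dist = lambda a, b: info[a][0][b]
    sig = lambda a, b: info[a][1][b]
    if dist(s, t) == dist(s, u) + dist(u, v) + dist(v, t):
        return sig(s, u) * sig(u, v) * sig(v, t)
    if dist(s, t) == dist(s, v) + dist(v, u) + dist(u, t):
        return sig(s, v) * sig(v, u) * sig(u, t)
    return 0

# Hypothetical example: a square 0-1-3-2-0 with a pendant vertex 4 attached to 3.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
```

In this small graph there are two shortest paths from vertex 0 to vertex 4, both passing through vertex 3, and exactly one of them passes through the pair (1, 3).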



For the sake of notational simplicity, we will assume,
without loss of generality, that

$$
d(s,u) < d(s,v)
\tag{A3}
$$

for the remainder of this discussion.

The remaining quantities we need to introduce are notions
of the path-dependency of vertices. In the spirit of
[1], we define the "dependency" of vertices s and t on the
vertex pair (u, v) as

$$
\delta_{st}(u,v) = \frac{\sigma_{st}(u,v)}{\sigma_{st}},
\tag{A4}
$$



and we define the dependency of s alone on the pair of 
vertices (u, v) as 



$$
\delta_{s}(u,v) = \sum_{t \in V \setminus \{u,v\}} \delta_{st}(u,v)
= \sum_{t \in V \setminus \{u,v\}} \frac{\sigma_{st}(u,v)}{\sigma_{st}}.
\tag{A5}
$$

Similarly, we define the pair-wise dependency of s and t 
on a single vertex v as 



$$
\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}},
\tag{A6}
$$



and the dependency of s alone on v as 



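$$
\delta_{s}(v) = \sum_{t \in V \setminus \{v\}} \delta_{st}(v)
= \sum_{t \in V \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}}.
\tag{A7}
$$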



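Note that unlike [1], we exclude $t = v$ from the sum in
(A7). Two relations that follow immediately from these
definitions, combined with (A1) and (A2), are

\begin{align}
\sigma_{st}(u,v) &= \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} \notag \\
&= \sigma_{sv}(u)\,\sigma_{vt} \notag \\
&= \frac{\sigma_{sv}(u)}{\sigma_{sv}}\,\sigma_{sv}\,\sigma_{vt} \notag \\
&= \delta_{sv}(u)\,\sigma_{st}(v), \tag{A8}
\end{align}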



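and

$$
\delta_{st}(u,v) = \frac{\sigma_{st}(u,v)}{\sigma_{st}}
= \frac{\delta_{sv}(u)\,\sigma_{st}(v)}{\sigma_{st}}
= \delta_{sv}(u)\,\delta_{st}(v).
\tag{A9}
$$

These two relations allow us to show that

\begin{align}
\delta_{s}(u,v) &= \sum_{t \in V \setminus \{u,v\}} \delta_{st}(u,v) \tag{A10} \\
&= \sum_{t \in V \setminus \{u,v\}} \delta_{sv}(u)\,\delta_{st}(v) \quad \text{by (A9)} \tag{A11} \\
&= \delta_{sv}(u)\,\delta_{s}(v), \tag{A12}
\end{align}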



since $\delta_{su}(v) = 0$ by (A3). Using Eq. (A8), we obtain

$$
\delta_{s}(u,v) = \frac{\delta_{s}(v)}{\sigma_{sv}}\,\sigma_{sv}(u).
\tag{A13}
$$



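We use this result to re-express the co-betweenness defined
in (2) as

\begin{align}
B(u,v) &= \sum_{s,t \in V \setminus \{u,v\}} \frac{\sigma_{st}(u,v)}{\sigma_{st}} \tag{A14} \\
&= \sum_{s \in V \setminus \{u,v\}} \left( \sum_{t \in V \setminus \{u,v\}} \delta_{st}(u,v) \right) \tag{A15} \\
&= \sum_{s \in V \setminus \{u,v\}} \delta_{s}(u,v) \tag{A16} \\
&= \sum_{s \in V \setminus \{u,v\}} \frac{\delta_{s}(v)}{\sigma_{sv}}\,\sigma_{sv}(u). \tag{A17}
\end{align}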
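The equivalence of (A14) and (A17) can be checked numerically on a small graph. The sketch below is only an illustration under stated assumptions: the graph is connected, unweighted and undirected, sums run over ordered pairs (s, t), and for each source s the pair is relabelled so that the nearer vertex plays the role of u, as in (A3). The function names and the graph `EXAMPLE` are hypothetical.

```python
from collections import deque

def bfs_counts(adj, s):
    """Return (dist, sigma): BFS distances from s and shortest-path counts."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma

def cobetweenness_direct(adj, u, v):
    """Co-betweenness from (A14), with sigma_st(u, v) evaluated via (A2)."""
    info = {x: bfs_counts(adj, x) for x in adj}
    dist = lambda a, b: info[a][0][b]
    sig = lambda a, b: info[a][1][b]
    total = 0.0
    for s in adj:
        for t in adj:
            if s == t or s in (u, v) or t in (u, v):
                continue
            if dist(s, t) == dist(s, u) + dist(u, v) + dist(v, t):
                through = sig(s, u) * sig(u, v) * sig(v, t)
            elif dist(s, t) == dist(s, v) + dist(v, u) + dist(u, t):
                through = sig(s, v) * sig(v, u) * sig(u, t)
            else:
                through = 0
            total += through / sig(s, t)
    return total

def cobetweenness_a17(adj, u, v):
    """Co-betweenness from (A17): sum over s of delta_s(b) / sigma_sb * sigma_sb(a)."""
    info = {x: bfs_counts(adj, x) for x in adj}
    dist = lambda a, b: info[a][0][b]
    sig = lambda a, b: info[a][1][b]
    total = 0.0
    for s in adj:
        if s in (u, v):
            continue
        # Relabel so that a is the vertex nearer to s, mirroring (A3).
        a, b = (u, v) if dist(s, u) < dist(s, v) else (v, u)
        if dist(s, b) != dist(s, a) + dist(a, b):
            continue                       # sigma_sb(a) = 0 by (A1)
        sigma_sb_a = sig(s, a) * sig(a, b)
        delta_s_b = 0.0                    # delta_s(b) as defined in (A7)
        for t in adj:
            if t in (b, s):
                continue
            if dist(s, t) == dist(s, b) + dist(b, t):
                delta_s_b += sig(s, b) * sig(b, t) / sig(s, t)
        total += delta_s_b / sig(s, b) * sigma_sb_a
    return total

# Hypothetical example: a square 0-1-3-2-0 with a pendant vertex 4 attached to 3.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
```

The direct version recomputes every term of the double sum, whereas the (A17) version needs only one dependency score per source vertex, which is what makes the recursive computation of those scores worthwhile.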



Lastly, to establish the recursive relation in (6), note
that for a child vertex $w \in c_s(v)$ every path to v gives
rise to exactly one path to w by following the edge (v, w).
This means that

$$
\sigma_{sw}(v) = \sigma_{sv} \quad \text{for } w \in c_s(v),
\tag{A18}
$$

and that

$$
\delta_{sw}(v) = \frac{\sigma_{sw}(v)}{\sigma_{sw}} = \frac{\sigma_{sv}}{\sigma_{sw}} \quad \text{for } w \in c_s(v).
\tag{A19}
$$

Also note that for $t = w$ we have

$$
\delta_{st}(w) = 1.
\tag{A20}
$$

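This allows us to decompose $\delta_s(v)$ in essentially the same
manner as in [1], namely,

\begin{align}
\delta_{s}(v) &= \sum_{t \in V \setminus \{v\}} \delta_{st}(v) \tag{A21} \\
&= \sum_{t \in V \setminus \{v\}} \sum_{w \in c_s(v)} \delta_{st}(v,w) \tag{A22} \\
&= \sum_{w \in c_s(v)} \sum_{t \in V \setminus \{v\}} \delta_{st}(v,w) \tag{A23} \\
&= \sum_{w \in c_s(v)} \sum_{t \in V \setminus \{v\}} \delta_{sw}(v)\,\delta_{st}(w) \quad \text{by (A9)} \tag{A24} \\
&= \sum_{w \in c_s(v)} \delta_{sw}(v) \left( \delta_{sw}(w) + \sum_{t \in V \setminus \{v,w\}} \delta_{st}(w) \right). \tag{A25}
\end{align}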



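Using (A19) and (A20), we then obtain

$$
\delta_{s}(v) = \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \left( 1 + \delta_{s}(w) \right),
\tag{A26}
$$

where the last equality is due to the fact that, since w is
a child of v, we have $\sigma_{sv}(w) = 0$ and thus $\delta_{sv}(w) = 0$.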
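The recursion in (A26) can be evaluated by visiting vertices in order of decreasing distance from s, so that every child is processed before its parents. The following Python sketch illustrates this under the assumption of a connected, unweighted, undirected graph, and compares the result with a direct evaluation of definition (A7). The function names and the graph `EXAMPLE` are hypothetical; this is not the authors' Matlab implementation.

```python
from collections import deque

def bfs_dag(adj, s):
    """BFS from s: distances, shortest-path counts, and visiting order."""
    dist, sigma, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma, order

def dependencies_recursive(adj, s):
    """delta_s(v) for all v, accumulated with the recursion (A26)."""
    dist, sigma, order = bfs_dag(adj, s)
    delta = {v: 0.0 for v in order}
    for w in reversed(order):              # farthest vertices first
        for v in adj[w]:
            if dist[v] == dist[w] - 1:     # w is a child of v, i.e. w in c_s(v)
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta

def dependencies_direct(adj, s):
    """delta_s(v) for all v, summed term by term from (A7) and (A1)."""
    info = {x: bfs_dag(adj, x) for x in adj}
    dist = lambda a, b: info[a][0][b]
    sig = lambda a, b: info[a][1][b]
    delta = {}
    for v in adj:
        total = 0.0
        for t in adj:
            if t == v:
                continue                   # t = v is excluded in (A7)
            if dist(s, t) == dist(s, v) + dist(v, t):
                total += sig(s, v) * sig(v, t) / sig(s, t)
        delta[v] = total
    return delta

# Hypothetical example: a square 0-1-3-2-0 with a pendant vertex 4 attached to 3.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
```

The recursive version inspects each edge a constant number of times per source vertex, while the direct version needs a full pass over all targets t for every vertex v.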



APPENDIX B: ALGORITHMIC COMPLEXITY

Standard breadth-first search results put the running
time for the first stage of our algorithm at $O(n_e)$, and
since we touch each edge at most twice when we compute
the dependency scores $\delta_s(v)$, the running time for
the second stage is also $O(n_e)$. Since we repeat each
stage for each vertex in the network, the first two stages
have a running time of $O(n_v n_e)$. The running time for
the depth-first traversal that occurs during the third
stage depends on the number and length of all shortest
paths in the network. Overall, we visit every shortest
path once and compute a co-betweenness contribution
for each edge of every shortest path. For 'small-world'
networks, i.e., networks with an $O(\log n_v)$ diameter,
we must compute $O(\sigma \cdot \log n_v)$ contributions, where
$\sigma = \sum_{u,v \in V} \sigma_{uv}$ is the total number of shortest paths
in the network. So the overall running time for the
algorithm is $O(n_v n_e + \sigma \log n_v)$. Empirical evidence
suggests that the upper bound for the average of $\sigma_{uv}$ over $u \in V$
ranges from $n_v^{0.19}$ to $n_v^{0.32}$ for common random graph
models, and at worst has been seen to reach $n_v^{0.62}$ in the
case of a network of airports. (In the latter case, there
were extreme fluctuations in this average, so the total
number of shortest paths, $\sigma$, might be much smaller than
$n_v(n_v - 1)$ times this upper bound.) Writing the upper bound
as $n_v^{p}$, this suggests a running time of
$O(n_v n_e + n_v^{2+p} \log n_v)$, though it is an open
question to show this rigorously. In the case of sparse
networks, where $n_e \sim n_v$, this reduces to a running time
of $O(n_v^{2+p} \log n_v)$.
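To get a feel for the quantity that drives the third-stage cost, the total number of shortest paths can be counted with one breadth-first search per source vertex. The sketch below is a minimal illustration, assuming an unweighted, undirected graph, ordered pairs (u, v), and the exclusion of the trivial pairs u = v (a convention adopted here for the example only). The function names and the graph `EXAMPLE` are hypothetical.

```python
from collections import deque

def count_shortest_paths(adj, s):
    """Return sigma, where sigma[t] is the number of shortest paths from s to t."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return sigma

def total_shortest_paths(adj):
    """Sum of sigma_uv over ordered pairs u != v (one BFS per source vertex)."""
    total = 0
    for s in adj:
        sigma = count_shortest_paths(adj, s)
        total += sum(count for t, count in sigma.items() if t != s)
    return total

def mean_shortest_paths(adj):
    """Average number of shortest paths per ordered pair of distinct vertices."""
    n = len(adj)
    return total_shortest_paths(adj) / (n * (n - 1))

# Hypothetical example: a square 0-1-3-2-0 with a pendant vertex 4 attached to 3.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
```

Tracking how this mean grows with the number of vertices across graphs of increasing size is one way to estimate, empirically, the exponent that enters the running-time expression above.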